With the rapid development of big data, it has become easier than ever to learn an optimal decision rule by updating it recursively while making decisions online. We study online statistical inference of model parameters in a contextual bandit framework of sequential decision-making. We propose a general framework for online and adaptive data collection environments that updates decision rules via weighted stochastic gradient descent (SGD). We allow different weighting schemes for the stochastic gradient and establish the asymptotic normality of the parameter estimator. Via inverse probability weights, our proposed estimator significantly improves the asymptotic efficiency of the previous averaged SGD approach. We also conduct an optimality analysis of the weights in a linear regression setting. We provide a Bahadur representation of the proposed estimator and show that its remainder term entails a slower convergence rate than that of classical SGD due to the adaptive data collection.
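To make the weighting concrete, here is a minimal sketch of inverse-probability-weighted SGD with Polyak-Ruppert averaging under an epsilon-greedy contextual bandit; the two-armed linear-reward setup, step-size constants, and all names are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, eps = 5, 50_000, 0.2                           # dimension, horizon, exploration
theta_star = np.array([0.5, -0.3, 0.8, 0.1, -0.6])   # true parameter (synthetic)
theta, theta_bar = np.zeros(d), np.zeros(d)          # SGD iterate and its average

for t in range(1, T + 1):
    contexts = rng.normal(size=(2, d))               # features of the two arms
    greedy = int(np.argmax(contexts @ theta))        # current decision rule
    a = greedy if rng.random() > eps else int(rng.integers(2))
    pi = 1 - eps / 2 if a == greedy else eps / 2     # propensity pi_t(a_t | x_t)
    x, y = contexts[a], contexts[a] @ theta_star + rng.normal()  # observed reward
    eta = 0.02 * t ** -0.501                         # step size eta_t = c * t^(-alpha)
    theta -= eta * (1.0 / pi) * (x @ theta - y) * x  # IPW-weighted SGD step
    theta_bar += (theta - theta_bar) / t             # Polyak-Ruppert averaging

print(np.linalg.norm(theta_bar - theta_star))        # estimation error
```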
Video super-resolution is one of the most popular tasks on mobile devices, widely used to automatically improve low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and invited the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to 500 FPS and a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.
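As a rough illustration of the kind of NPU-friendly design the challenge targets, below is a minimal sketch of a 4X upscaler built from plain convolutions and a single depth-to-space (pixel shuffle) layer, operations that quantize and accelerate well; the layer counts and widths are arbitrary assumptions, not any participant's entry.

```python
import tensorflow as tf

def tiny_vsr(scale=4, width=16):
    """Toy 4X super-resolution net: a few convs plus one depth-to-space layer."""
    inp = tf.keras.Input(shape=(None, None, 3))
    x = tf.keras.layers.Conv2D(width, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(width, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv2D(3 * scale * scale, 3, padding="same")(x)
    # Pixel shuffle: rearrange channels into a (scale x scale) spatial grid.
    out = tf.keras.layers.Lambda(lambda t: tf.nn.depth_to_space(t, scale))(x)
    return tf.keras.Model(inp, out)

model = tiny_vsr()
print(model.count_params())   # a few thousand parameters, NPU-friendly
```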
Self-training has shown great potential in semi-supervised learning. Its core idea is to use the model learned on labeled data to generate pseudo-labels for unlabeled samples, and then teach itself. To obtain effective supervision, prevailing approaches typically employ a momentum teacher for pseudo-label prediction, yet they suffer from the confirmation bias issue, where incorrect predictions may provide wrong supervision signals and accumulate during training. The primary cause of this drawback is that the prevailing self-training framework guides the current state with previous knowledge, because the teacher is updated only with past students. To alleviate this problem, we propose a novel self-training strategy that allows the model to learn from the future. Concretely, at each training step, we first virtually optimize the student (i.e., cache the gradients without applying them to the model weights), then update the teacher with the virtual future student, and finally ask the teacher to produce pseudo-labels for the current student as guidance. In this way, we manage to improve the quality of pseudo-labels and thus boost performance. We also develop two variants of our future-self-training (FST) framework by peeping at the future both deeply (FST-D) and widely (FST-W). Taking the tasks of unsupervised domain adaptive semantic segmentation and semi-supervised semantic segmentation as instances, we experimentally demonstrate the effectiveness and superiority of our approach under a wide range of settings. Code will be made publicly available.
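A minimal sketch of one "learn from the future" training step is given below, assuming an EMA teacher and hard pseudo-labels; for simplicity it snapshots and rolls back the student weights rather than caching gradients, and all function names are illustrative.

```python
import copy
import torch
import torch.nn.functional as F

def fst_step(student, teacher, opt, x_unlabeled, ema=0.999):
    # 1) Virtually optimize the student: take a step we will later undo.
    snapshot = copy.deepcopy(student.state_dict())
    with torch.no_grad():
        pl = teacher(x_unlabeled).argmax(dim=1)          # provisional pseudo-labels
    loss = F.cross_entropy(student(x_unlabeled), pl)
    opt.zero_grad(); loss.backward(); opt.step()         # student is now "future"

    # 2) Update the teacher with the virtual future student (EMA).
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(ema).add_(s_p, alpha=1.0 - ema)

    # 3) Roll the student back and supervise it with the future-aware teacher.
    student.load_state_dict(snapshot)
    with torch.no_grad():
        pl_future = teacher(x_unlabeled).argmax(dim=1)   # improved pseudo-labels
    loss = F.cross_entropy(student(x_unlabeled), pl_future)
    opt.zero_grad(); loss.backward(); opt.step()
```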
Binaural audio plays a significant role in constructing immersive augmented and virtual realities. As it is expensive to record binaural audio from the real world, synthesizing it from mono audio has attracted increasing attention. This synthesis process involves not only the basic physical warping of the mono audio, but also room reverberations and head/ear-related filtrations, which are difficult to simulate accurately with traditional digital signal processing. In this paper, we formulate the synthesis process from a different perspective by decomposing the binaural audio into a common part shared by the left and right channels and a specific part that differs in each channel. Accordingly, we propose BinauralGrad, a novel two-stage framework equipped with diffusion models to synthesize them respectively. Specifically, in the first stage, the common information of the binaural audio is generated with a single-channel diffusion model conditioned on the mono audio, based on which the binaural audio is generated by a two-channel diffusion model in the second stage. Combining this novel perspective of two-stage synthesis with advanced generative models (i.e., diffusion models), the proposed BinauralGrad is able to generate accurate and high-fidelity binaural audio samples. Experimental results show that on a benchmark dataset, BinauralGrad outperforms the existing baselines by a large margin in terms of both objective and subjective evaluation metrics (Wave L2: 0.128 vs. 0.157, MOS: 3.80 vs. 3.61). The generated audio samples (https://speechresearch.github.io/binauralgrad) and code (https://github.com/microsoft/NeuralSpeech/tree/master/BinauralGrad) are available online.
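A minimal sketch of the two-stage decomposition, assuming the common part is taken as the channel mean and the diffusion samplers are pseudo-interfaces standing in for the actual BinauralGrad models:

```python
import numpy as np

def decompose(left, right):
    """Training targets: the stage-1 target is the part shared by both ears."""
    common = 0.5 * (left + right)        # channel mean, identical for L and R
    return common, np.stack([left, right])

def synthesize(mono, stage1, stage2):
    """Inference: stage 1 makes the common part, stage 2 makes both channels."""
    common = stage1.sample(condition=mono)                    # shape (T,)
    return stage2.sample(condition=np.stack([mono, common]))  # shape (2, T)
```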
In this work, we propose SymphonyNet, a permutation-invariant language model, as a solution for symbolic symphony music generation. We propose a novel Multi-track Multi-instrument Repeatable (MMR) representation with a particular 3-D positional embedding, and model the music sequence with a Transformer-based auto-regressive language model. To overcome length overflow when modeling extra-long symphony tokens, we also propose a modified byte pair encoding algorithm (Music BPE) for music tokens and introduce a novel linear transformer decoder architecture as the backbone. Meanwhile, we train the decoder to learn automatic orchestration as a joint task by masking instrument information from the input. We also introduce a large-scale symbolic symphony dataset to advance symphony generation research. Empirical results show that the proposed approach can generate coherent, novel, complex, and harmonious symphonies, serving as a pioneering solution for multi-track multi-instrument symbolic music generation.
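The following sketch shows the generic byte-pair-encoding mechanism that Music BPE builds on: repeatedly merge the most frequent adjacent pair of tokens into a single new token, shortening extra-long sequences. Which pairs are eligible for merging in the actual Music BPE differs, and the token names here are purely illustrative.

```python
from collections import Counter

def bpe_merges(seq, num_merges):
    """Apply num_merges rounds of most-frequent-pair merging to a token list."""
    seq = list(seq)
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))      # counts of adjacent token pairs
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]     # most frequent pair
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                merged.append(f"{a}+{b}"); i += 2   # fuse the pair into one token
            else:
                merged.append(seq[i]); i += 1
        seq = merged
    return seq

print(bpe_merges(["C4", "E4", "G4", "C4", "E4", "G4", "B3"], 2))
# ['C4+E4+G4', 'C4+E4+G4', 'B3'] -- the sequence shrinks with each merge
```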
An important challenge in news fact-checking is the effective dissemination of existing fact-checks. This, in turn, calls for reliable methods to detect previously fact-checked claims. In this paper, we focus on automatically finding existing fact-checks for claims made in social media posts (tweets). We conduct both classification and retrieval experiments in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings, using multilingual transformer models such as XLM-RoBERTa and multilingual embedding models such as LaBSE and SBERT. We present promising results for "match" classification (86% average accuracy) across four language pairs. We also find that a BM25 baseline outperforms, or is on par with, state-of-the-art multilingual embedding models in our monolingual experiments. We highlight and discuss the NLP challenges of addressing this problem in different languages, and introduce a novel fact-check dataset with corresponding tweets for future research.
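A minimal sketch of the embedding-based retrieval side, assuming cosine similarity over LaBSE sentence embeddings via the sentence-transformers library; the example texts are invented and the checkpoint is one plausible choice, not necessarily the paper's exact setup.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/LaBSE")
fact_checks = ["Claim A was rated false by fact-checkers.",
               "Claim B was verified as true."]
tweet = "Saw a post claiming B happened, is it real?"

fc_emb = model.encode(fact_checks, convert_to_tensor=True)   # corpus embeddings
tw_emb = model.encode(tweet, convert_to_tensor=True)         # query embedding
hits = util.semantic_search(tw_emb, fc_emb, top_k=1)[0]      # cosine retrieval
print(fact_checks[hits[0]["corpus_id"]], hits[0]["score"])
```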
In this paper, we investigate the statistical inference of model parameters in stochastic optimization via a Kiefer-Wolfowitz algorithm with random search directions. We first present the asymptotic distribution of the Polyak-Ruppert-averaging type Kiefer-Wolfowitz (AKW) estimator, whose asymptotic covariance matrix depends on the function-query complexity and the distribution of search directions. The distributional result reflects the trade-off between statistical efficiency and function-query complexity. We further analyze the choice of random search directions that minimizes the asymptotic covariance matrix, and conclude that the optimal search direction depends on the optimality criterion with respect to different summary statistics of the Fisher information matrix. Based on the asymptotic distribution results, we conduct one-pass statistical inference by providing two constructions of valid confidence intervals. We provide numerical experiments that verify our theoretical results and illustrate the practical effectiveness of our procedures.
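A minimal sketch of the AKW recursion on a toy quadratic accessed only through noisy function queries, with Gaussian search directions, two queries per iteration, and Polyak-Ruppert averaging; the step-size and spacing rates are illustrative choices, not the paper's optimal ones.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T = 3, 50_000
x_star = np.array([1.0, -2.0, 0.5])          # minimizer of the toy loss

def f_noisy(x):                               # zeroth-order oracle: noisy f(x)
    return 0.5 * np.sum((x - x_star) ** 2) + 0.1 * rng.normal()

x, x_bar = np.zeros(d), np.zeros(d)
for t in range(1, T + 1):
    v = rng.normal(size=d)                    # random search direction
    h = t ** -0.101                           # finite-difference spacing h_t
    g = (f_noisy(x + h * v) - f_noisy(x - h * v)) / (2 * h) * v  # 2-query gradient
    x -= 0.2 * t ** -0.7 * g                  # KW step with eta_t = 0.2 t^(-0.7)
    x_bar += (x - x_bar) / t                  # Polyak-Ruppert (AKW) averaging

print(np.round(x_bar, 2))                     # approaches x_star
```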
Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples. However, their iterative refinement process in high-dimensional data space results in slow inference speed, which restricts their application in real-time systems. Previous works have explored speeding up inference by minimizing the number of inference steps, but at the cost of sample quality. In this work, to improve the inference speed of DDPM-based TTS models while achieving high sample quality, we propose ResGrad, a lightweight diffusion model that learns to refine the output spectrogram of an existing TTS model (e.g., FastSpeech 2) by predicting the residual between the model output and the corresponding ground-truth speech. ResGrad has several advantages: 1) Compared with other acceleration methods for DDPMs, which need to synthesize speech from scratch, ResGrad reduces the complexity of the task by changing the generation target from the ground-truth mel-spectrogram to the residual, resulting in a more lightweight model and thus a smaller real-time factor. 2) ResGrad is employed in the inference process of the existing TTS model in a plug-and-play way, without re-training this model. We verify ResGrad on the single-speaker dataset LJSpeech and on two more challenging datasets with multiple speakers (LibriTTS) and a high sampling rate (VCTK). Experimental results show that, in comparison with other speed-up methods for DDPMs: 1) ResGrad achieves better sample quality at the same inference speed measured by the real-time factor; 2) at similar speech quality, ResGrad synthesizes speech more than 10 times faster than the baseline methods. Audio samples are available at https://resgrad1.github.io/.
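A minimal sketch of plug-and-play ResGrad-style inference; the TTS model, diffusion sampler, and vocoder are pseudo-interfaces standing in for FastSpeech 2, the residual diffusion model, and a neural vocoder.

```python
def resgrad_infer(text, tts_model, residual_diffusion, vocoder):
    coarse_mel = tts_model(text)              # output of the existing TTS model
    # The lightweight diffusion model predicts only the residual between the
    # TTS output and the ground-truth mel, a simpler target than full generation.
    residual = residual_diffusion.sample(condition=coarse_mel)
    refined_mel = coarse_mel + residual       # plug-and-play refinement
    return vocoder(refined_mel)               # waveform
```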
Federated learning has recently been applied to recommendation systems to protect user privacy. In federated learning settings, recommendation systems can train recommendation models by collecting only intermediate parameters instead of real user data, which greatly enhances user privacy. Besides, federated recommendation systems can collaborate with other data platforms to improve recommendation model performance while meeting regulatory and privacy constraints. However, federated recommendation systems face many new challenges such as privacy, security, heterogeneity, and communication costs. While significant research has been conducted in these areas, gaps in the survey literature still exist. In this survey, we (1) summarize some common privacy mechanisms used in federated recommendation systems and discuss the advantages and limitations of each mechanism; (2) review some robust aggregation strategies and several novel attacks against security; (3) summarize some approaches to address heterogeneity and communication-cost problems; (4) introduce some open-source platforms that can be used to build federated recommendation systems; (5) present some prospective research directions for the future. This survey can help researchers and practitioners understand the research progress in these areas.
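As a concrete instance of the parameter-only exchange described above, here is a minimal FedAvg-style aggregation sketch in which clients share item-embedding updates rather than raw interactions; a real federated recommender would add secure aggregation or differential privacy on top, and all names here are illustrative.

```python
import numpy as np

def fed_avg(client_params, client_sizes):
    """Weighted average of client parameters, weighted by local data size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_params, client_sizes))

# Each client trains locally on private interactions and sends only
# intermediate parameters (e.g., item embeddings), never raw user data.
clients = [np.random.default_rng(i).normal(size=(4, 8)) for i in range(3)]
global_item_emb = fed_avg(clients, client_sizes=[100, 50, 150])
print(global_item_emb.shape)   # (4, 8): the aggregated global embeddings
```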
Is it possible for a first-order method, i.e., one allowed to use only first derivatives, to be quadratically convergent? For univariate loss functions, the answer is yes: the Steffensen method avoids second derivatives and is still quadratically convergent, like Newton's method. By incorporating an optimal step size we can even push its convergence order beyond quadratic to $1+\sqrt{2} \approx 2.414$. While such high convergence orders are pointless overkill for a deterministic algorithm, they become rewarding when the algorithm is randomized for problems of massive size, as randomization invariably compromises convergence speed. We introduce two adaptive learning rates inspired by the Steffensen method, intended for use in a stochastic optimization setting; they require no hyperparameter tuning aside from the batch size. Extensive experiments show that they compare favorably with several existing first-order methods. When restricted to a quadratic objective, our stochastic Steffensen methods reduce to the randomized Kaczmarz method (note that this is not true for SGD or SLBFGS), and thus we may also view our methods as a generalization of randomized Kaczmarz to arbitrary objectives.
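For intuition, here is the univariate Steffensen update applied to a gradient g = L': a divided difference of g along g itself replaces the second derivative in Newton's method. This is the deterministic core only; the paper's stochastic variants adapt it with minibatch gradients and are not reproduced here.

```python
def steffensen_step(x, g):
    """One Steffensen update for minimizing L, given only g = L'."""
    gx = g(x)
    denom = g(x + gx) - gx        # divided difference of g along g ~ L''(x) * gx
    if denom == 0.0:
        return x                  # already stationary (or degenerate)
    return x - gx * gx / denom    # no second derivative used

# Example: L(x) = (x - 3)^2 / 2, so g(x) = x - 3; converges immediately here
# and quadratically for general smooth g near a minimizer.
g = lambda x: x - 3.0
x = 0.0
for _ in range(4):
    x = steffensen_step(x, g)
print(x)   # 3.0
```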